You may use these incident response templates and scenarios internal to your business and team. You do not have permission to claim ownership, re-publish this material to the public, or sell it. All text is original work by Kieri Solutions LLC: V. Amira Armond.
1. What is Incident Response in Cybersecurity?
Incident Response is the operations part of Cybersecurity. It is about responding to problems in real time. In a lot of ways, an incident is like a helpdesk call. There is some trigger, such as a user complaining about strange network behavior or an automated alert from the security system. The cybersecurity technician starts a process to record the symptom and gather information. If it is a routine concern, the technician would coordinate a fix, document it, and close the incident. If the problem is serious, or could be serious, the technician would escalate the problem through pre-defined steps to get the right people involved.
Example of a routine incident (in a large company)
Jim checks the daily antivirus report and finds that workstation BOSTON0094 has been infected with a virus. He starts a ticket, copies details into it, establishes a remote connection to the workstation’s network port and puts it into quarantine. Jim performs research and finds that the virus does not have any remote control or data export features, so there is no need to escalate the level of the incident. He then dispatches a local technician to re-image the workstation. The local technician talks to the workstation owner, picks it up, verifies that critical data has been saved, and re-installs the standard company build on it. The workstation is returned to the owner and the ticket is closed.
Example of a serious incident (in a large company)
During routine maintenance of the database server, Jim notices a new administrator account that was created a week ago. It doesn’t belong to anyone on the database team and could be a sign of a security breach. He starts a ticket and calls his CISO. The CISO evaluates the situation and decides to contact a specialized cyber-crime consultant for assistance. Over the next few days, they find evidence that an attacker compromised a workstation and then moved laterally through the network. The attacker copied 20 TB of sensitive data out of the network. Many parties have to get involved: the IT department, the CISO, the CEO, the legal team, the outside cyber-crime consultant, the FBI, the state police, the cyber-insurance company, and the public relations team, to name a few. The incident takes several months to resolve and damages the company’s reputation and finances.
2. Tools and templates for a cyber security incident
Before an incident, make sure you have these vital tools, templates, and information used during cyber-security incident response:
Cyber-security incident response policy
This document describes the types of incidents that could impact your company, who the responsible parties are, and the steps to take to resolve each type of incident. It should be customized for your company. For example, a federal contractor should address the risk of sensitive schematics being stolen, which could impact the security of the United States. A medical facility should address the risk of a PHI data breach and the patient notifications that would follow.
The incident response policy should have an escalation list so that the correct people are engaged. It should also have timelines for response and communication actions. For example, a minor incident may only involve the help desk. A major incident might need to be communicated to the officers of the company, public relations department, legal firm, and outside law enforcement within a few hours of discovery.
The incident response policy should also discuss procedures for isolating compromised systems, recording evidence, and maintaining a clear chain-of-custody.
Business Continuity Plan (BCP) or Disaster Recovery Plan (DRP)
The BCP or DRP may be utilized during an incident, especially if the incident causes an outage or requires restoration of services. For example, if a server is infected with ransomware, you will need to 1) rebuild or fail over services and 2) restore from backup. If all of your servers are infected with ransomware, as happened to the City of Atlanta in early 2018, then the level of complexity goes up. You definitely want to have a printed-out plan with tested procedures, configurations, and recovery materials ready to go.
Remember that incidents are not just hackers and viruses. A cyber incident could also be an approaching hurricane, a failed hardware device, or a power outage.
As part of your backups, you should have saved configuration files for each of your routers, firewalls, switches, and other network devices. There are a few reasons for this. If a network device fails, you can normally get a replacement within a few hours. If you have a saved configuration file, you just upload it and go. If not, you might be stuck troubleshooting VPNs, firewall rules, and VLANs, and changing settings as your users complain. Not a good situation. Having time-stamped configuration files also helps during forensics and incident investigation – if you see a strange rule (such as a new VPN to a foreign country’s IP) – it would be helpful to find out how long ago the rule was added, and whether it was one of your technicians or an unknown source.
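To show how time-stamped configuration files help during an investigation, here is a minimal Python sketch that finds the earliest saved snapshot containing a suspicious rule. Everything in it is an assumption for illustration: the `first_seen` function, the snapshot timestamps, and the rule text (which is not real vendor syntax). It also assumes every timestamp uses the same ISO 8601 format and time zone, so that sorting the strings sorts them by time.

```python
from datetime import datetime
from typing import Optional


def first_seen(snapshots: dict[str, str], rule: str) -> Optional[datetime]:
    """Return the time of the earliest snapshot that contains `rule`, or None."""
    # ISO 8601 strings in one format and time zone sort chronologically.
    for stamp in sorted(snapshots):
        lines = {line.strip() for line in snapshots[stamp].splitlines()}
        if rule.strip() in lines:
            return datetime.fromisoformat(stamp)
    return None


# Hypothetical snapshots, keyed by the time each configuration backup was taken.
snapshots = {
    "2018-03-01T02:00:00": "permit tcp any host 10.0.0.5 eq 443\n",
    "2018-03-08T02:00:00": (
        "permit tcp any host 10.0.0.5 eq 443\n"
        "permit ip any host 203.0.113.7\n"
    ),
    "2018-03-15T02:00:00": (
        "permit tcp any host 10.0.0.5 eq 443\n"
        "permit ip any host 203.0.113.7\n"
    ),
}

print(first_seen(snapshots, "permit ip any host 203.0.113.7"))
```

The result only narrows the window: the rule was added some time between the previous snapshot and the one returned, so more frequent snapshots give a tighter answer.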
You should also have baseline configurations saved for your user workstations. At a high level, this means having a list of software that has been approved for use on the network. At a granular level, this means running scripts occasionally to pull a list of all software installed, hardware devices, firmware versions, and drivers.
Ideally, as part of your defensive posture, your organization should routinely check for changes in the baselines. A new software program on one workstation could indicate spyware. Changes to firewall rules could indicate an internal threat.
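A baseline check can be as simple as comparing sets. The sketch below assumes you already have some way to collect the installed software names for a workstation (that collection step is not shown), and the function name and sample software names are made up for illustration.

```python
def inventory_changes(previous: set[str], current: set[str],
                      approved: set[str]) -> dict[str, list[str]]:
    """Compare a workstation's current software list to its last inventory
    and to the approved-software baseline."""
    return {
        "added": sorted(current - previous),        # new since the last check
        "removed": sorted(previous - current),      # gone since the last check
        "unapproved": sorted(current - approved),   # not on the approved list
    }


# Hypothetical data for one workstation.
approved = {"Office", "Chrome", "7-Zip"}
last_week = {"Office", "Chrome"}
today = {"Office", "Chrome", "FreeScreenSaver"}

print(inventory_changes(last_week, today, approved))
```

Anything that shows up under "unapproved" deserves a look, and anything under "added" that nobody remembers installing deserves a ticket.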
You should have a timestamped ticketing system, or at least a way for multiple people to record data about complex issues. This helps organizations keep track of complex problems without losing them or forgetting who is assigned what task. This removes a lot of second-guessing during investigations since each entry is time-stamped and marked with the author’s name.
Many intrusion detection systems include ticketing capabilities. One example is AlienVault.
Before an incident occurs, make sure you are gathering audit logs. Because attackers sometimes destroy logs to hide evidence, your logs should be sent to a central log (or syslog) aggregation system in near real-time. Examples of these systems are Splunk, AlienVault, and VMware Log Insight. Ideally, this log system will be firewalled from the rest of the network, with access limited to a few very trusted people. Each of your servers and network devices should be set to forward event logs to this system. Almost every operating system and network device in use today has an option to send “syslogs” to another server.
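Custom scripts and in-house applications can forward their own events the same way. As a rough sketch, Python's standard `logging.handlers.SysLogHandler` sends each log record to a remote syslog collector. The `build_forwarding_logger` helper, the logger name, and the collector hostname in the comment are assumptions for illustration, and port 514 over UDP is only the conventional default, so check what your aggregation system actually listens on.

```python
import logging
import logging.handlers


def build_forwarding_logger(host: str, port: int = 514,
                            name: str = "audit") -> logging.Logger:
    """Return a logger that forwards records to a syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a duplicate handler on repeat calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(
            logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Example use (the collector hostname is hypothetical):
# audit = build_forwarding_logger("logs.example.internal")
# audit.info("new administrator account created: %s", "dbadmin2")
```

UDP gives no delivery guarantee. If your collector accepts TCP, `SysLogHandler` also takes a `socktype=socket.SOCK_STREAM` argument.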
Incident Response Templates
These templates help remind organizations to gather additional information about security incidents. They also include a workflow, which should match the company’s Cybersecurity Incident Response Policy. Make sure to download and customize your incident response templates BEFORE an incident.
How can you prepare your team for an incident?
The best way to prepare is to run regular incident response drills. I recommend the following schedule:
- Informal IT-department round table to discuss a random incident scenario and discover gaps or concerns.
- Perform a test restore of at least one system.
- Technical review of procedures and policies for accuracy.
- Verify contacts list is still valid and contract numbers / policies / warranties are good.
- Tabletop drill that follows a complex scenario through to the end. Each escalation point is expected to participate in the drill.
- Perform a disaster recovery fail-over and fail-back, or relocation drill.
What is an informal round table?
An example would be the CISO randomly selecting scenario #4 from this document:
4. A man from “Linux” walks past the reception desk and searches for an open network port to plug his laptop into. What security procedures are in place to prevent him from accessing your network?
During the weekly team meeting, the CISO reads this scenario to her team. There is a 5-10 minute discussion about the topic. An IT department might come up with the following gaps:
- They have seen unannounced salespeople make it all the way to back offices in the past. There should be a procedure for the receptionist to record visitor IDs and have someone escort them.
- There are network ports in all offices which provide DHCP and LAN access to any device plugged in. The corporate switches have the ability to perform MAC filtering, automatic VLAN quarantining, and logically turn off ports, but they are not configured for this yet. The department starts a project to enable these security configurations.
- The server room door is near the front desk and is normally propped open due to ventilation issues. The CISO puts in a call to get the ventilation problems fixed so that the server room door can be secured.
Obviously, coming up with three new security projects each week will crush most IT departments. The CISO needs to prioritize by efficiency. Over time, this brainstorm process will greatly improve cyber security for an organization.
What does a test restore involve?
Here is a little known fact about IT: sometimes backups say they are successful, but when you try to restore from them, the backups don’t work. What the heck!!
This actually isn’t rare. It happens a lot.
You should never assume that your backups work unless you’ve successfully restored from them.
Backups are most likely to fail on database servers. This is because databases normally have multiple files in use, which need to be perfectly synchronized to each other. Without special configuration, backups will copy the first file, then copy the second file, then the third, etc. The files don’t match each other in the backup because they were copied at different times (even half a second will break their synchronization). Pro tip: Use the database’s embedded backup program to create backups to a file on the server, then back up that file.
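As one concrete illustration of the pro tip, SQLite (used here only because it ships with Python; your database will have its own backup tool) exposes a built-in backup call that copies a consistent snapshot even while the database is in use. The function names and file paths below are hypothetical.

```python
import sqlite3


def consistent_backup(live_path: str, backup_path: str) -> None:
    """Use the database engine's own backup API to write a consistent copy."""
    source = sqlite3.connect(live_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # snapshot taken by the engine, not a raw file copy
    finally:
        target.close()
        source.close()


def backup_looks_healthy(backup_path: str) -> bool:
    """Open the backup and ask SQLite to check its internal consistency."""
    conn = sqlite3.connect(backup_path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()


# Example use (paths are hypothetical):
# consistent_backup("/srv/app/orders.db", "/srv/backup-staging/orders.db")
# print(backup_looks_healthy("/srv/backup-staging/orders.db"))
```

The file written by `consistent_backup` is the one your regular backup software should pick up. The integrity check is a quick sanity test, not a substitute for a real test restore.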
The restore process can also be a huge problem if not properly prepared for. For example, if you try to restore an operating system backup to different hardware, it will almost always crash because the motherboard hardware and CPU are different. You will either need to remove hardware-specific drivers during the restore process or you will need to have the materials ready to install the operating system and programs separately. It is good to know what to expect beforehand.
And there are always little gotchas. Like your restore process uses a DVD but your servers don’t have DVD drives. Without actually trying to perform a test restore, you won’t catch the little problems.
If you are not careful, your test restore could overwrite your real server, or it could disrupt services by putting a second, conflicting, server on the network. Be careful. Choose your recovery target carefully and isolate the restored server by disconnecting the network. If you plan, there should be no risk of impact.
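One way to confirm that a test restore actually worked is to compare file hashes. The sketch below assumes you saved a manifest of SHA-256 hashes when the backup was taken, then restored into an isolated scratch directory rather than onto the real server. The function names are made up for illustration, and reading each file fully into memory is a simplification that will not suit very large files.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def restore_mismatches(saved: dict[str, str], restored_root: Path) -> list[str]:
    """List files that are missing, extra, or different after a restore."""
    restored = manifest(restored_root)
    names = saved.keys() | restored.keys()
    return sorted(n for n in names if saved.get(n) != restored.get(n))


# Example use (paths are hypothetical):
# saved = manifest(Path("/data/fileshare"))          # at backup time
# problems = restore_mismatches(saved, Path("/scratch/restore-test"))
```

An empty list means every file came back byte-for-byte. It says nothing about whether the restored server boots or the application starts, so it complements the hands-on test rather than replacing it.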
What is a technical review of procedures?
Even for experienced technical writers and engineers, it is REALLY hard to capture every step in a procedure on the first try.
So the document might be wrong to start with, or it might address scenario A well, but not scenario B.
Over time, changes to the environment will also make procedures incorrect. For example, an update of the backup software might change the menu options or add more steps to the recovery wizard.
The best way to handle this is to have your procedures document open when you are drilling an incident response or restore, and make a note whenever a step is wrong. Then follow up regularly to update the main document and send it to the team.
What do you mean, verify contacts and contract numbers?
Your organization should have someone who is responsible for vendor support contracts and escalation contacts. These contacts should include the following:
- How to reach corporate officers during an emergency (cell phones, etc)
- Internal POCs for each department, especially compliance, PR and IT.
- Local police non-emergency #
- FBI cyber-crime reporting #
- State cyber-crime reporting #
- HIPAA, GDPR, PCI, and other regulatory contacts (if applicable)
- Cyber-security consultant on retainer (if applicable)
- Law firm on retainer (if applicable)
- General insurance for fire, natural disaster, etc (include policy #s, info, limits)
- Cyber insurance for data loss, cybersecurity incidents, liability (include policy #s, info, limits)
- Data center POCs
- Branch office POCs
- IT system vendors and support contracts (include warranties, contract #s, and how to start a support call)
The best way to verify that a contact works is to CONTACT them. Put in a help-desk call, talk to the receptionist at the law office, etc. For the crime reporting numbers, regularly check the official websites for changes.
What is a tabletop drill?
Tabletop drills are where you get the PEOPLE and the PROCEDURES together to work through a detailed scenario, but you make no changes to the network or systems.
You want to test each person for responsiveness and to see whether they know their role. You want to test your procedures to make sure they are helpful and don’t have obvious gaps.
A non-participant should organize the drill, take notes, and record the time of each action. Every attempt should be made to make this drill realistic from a logical point of view.
For example, the organizer might call in to the help desk to report “their computer is acting strangely” – and that this is a drill. They would note how long it took for a technician to diagnose the “problem” and escalate to the security team. They could ask whether the computer has a working antivirus, whether it was recently scanned by the intrusion detection system, or whether the baseline has changed.
As the tabletop drill progresses, the organizer would watch to see if anyone ‘drops the ball’ or fails to escalate properly. They want each person to test normal communication methods for this drill – for example, if the security POC is on vacation, would they call her cell? Do they have the ability to notify customers of an outage? Will the cyber-security consultant answer within an hour? They also want to see how people use procedures and policy during the drill. Did someone use an out-of-date document? Did the procedures tell them to do something stupid? Did the person bypass policy? Pro tip: If you are pulled into a tabletop drill, it is hard to go wrong if you follow the relevant policies and procedures documents.
After the tabletop drill, there are normally many lessons learned and improvements identified. By doing these drills regularly, the entire organization prepares to handle major incidents.
What is a fail-over?
Failing over means moving your organization’s critical services to a different system and/or location. It is similar to testing a restore, but is “for real” – your users are using the services after they are moved.
Done right, there is almost no impact to users. A good fail-over should let an IT department sleep easy knowing that catastrophes will not hurt the organization.
Done wrong, a fail-over can take down critical services. For this reason, testing a fail-over should be done after-hours and system checks should be performed beforehand to verify that the system is ready for use.
Not every organization can fail over. It requires a large investment to purchase secondary systems and engineer the replication scripts. If you don’t have this capability, then perform the next best thing: test your ability to relocate.
What is a relocation drill?
This drill tests your ability to continue operations after a natural disaster, power outage, fire, or similar event. It can be as high-tech or low-tech as needed. Generally, you won’t be able to perform these actions for real, because your users would be impacted, but you want to simulate the real experience as much as possible.
Here is an example of a medium-sized business doing a relocation drill:
Scenario chosen: Category 5 hurricane causes massive flooding of corporate office. It will be unsuitable for human habitation for at least two months.
The IT department follows their business continuity plan. They have a goal of restoring the 3 most critical services within 12 hours. The IT department contacts their contingency location (another company in a nearby state) and loads spare network equipment and backup drives into a truck. They drive to the contingency location, verify access, and set up their equipment. Next come network configuration, loading software onto systems, and restoring backups. Then they test the ability of a ‘regular user’ to follow instructions to connect to the new servers.
I guarantee that whenever an IT department does a drill like this, they will learn a lot about what to do, and what not to do. You might find that it is worth the expense to stage more servers at a contingency location, rather than take the risk of not being able to buy new servers during a disaster.
Note: I know of an organization that had a data center in New Orleans during Katrina. It took them a month to restore critical services in a different state. It took them more than a year to get their data center back in New Orleans. This scenario really happens.
3. Free Incident Response Threat Scenarios, Questions, and Training Drills
Favorite this page so you can practice scenarios weekly with your team. More scenarios will be added over time.
- The first hard drive on your database server failed three days ago. The second hard drive will fail today. Would your staff have found and replaced the first drive before today? Would you have data loss if not? How would you handle the hardware replacement? How would you recover from data loss?
- The UPS in your main rack fails and all servers in that rack go offline. How long would it take to respond? How would you restore power? How would you verify that services are functional?
- A lady from “Microsoft” comes to the reception desk and asks for directions to the server room. What security procedures are in place to prevent her from physically accessing your servers?
- A man from “Linux” walks past the reception desk and searches for an open network port to plug his laptop into. What security procedures are in place to prevent him from accessing your network?
- A user from operations mistakes the corporate file share for their personal files, and deletes everything they are allowed to delete. Did they have too much access? What server(s) were involved? How long would it take to recover?
- One of your IT staff (pick randomly) uses a terrible password for all of their administrator accounts. “P@ssw0rd1!” is their favorite. Any combinations of “admin” “root” “administrator” “system” and this password will be compromised. Were any systems compromised? Does any system use well-known admin passwords like blank, default, “password”, “Pa$$w0rd1!”, “123456”, or “1qaz@WSX1qaz@WSX”? Does your team have a procedure to make sure that terrible passwords are not used?
- Pick a user who left the company between 3 and 6 months ago who had remote access. They will decide to access the corporate network using VPN or terminal services today. Will they succeed? Are all of their accounts disabled properly? Does your department have a procedure for regularly reviewing user accounts?
- A criminal hacker will attempt to access your corporate network VPN using ports visible from the Internet. Over the next two days, using a script, they will try 20,000 user name combinations (jsmith, smith, johnsmith) with the common password “Spring2018!”. Will they succeed? Would your department be alerted about this activity, or would they have to discover it using manual reviews? If manual, would your department detect this activity within a day?
- A category 5 hurricane is pointed directly toward your city. Waterways and low lying areas are expected to flood at 10 feet above normal water level. All non-essential staff are expected to stay at home for at least three days. Will your building flood? Is the building going to be totally dark with no power? What electronics and networking would be affected? Would anyone stay in the building to try to handle damage? Would you shut down and physically move equipment to higher ground? What would happen if the roof and windows leaked? Would you implement a disaster recovery plan ahead of time, or wait to see what happens?
- A category 5 hurricane is pointed 200 miles north of you. On the day of landfall, it takes a hard left and impacts your city unexpectedly. The power is out for three days for each mile between you and the nearest power substation (if your nearest substation is next door, your power never goes out; if it is two miles away, your power goes out for 6 days; if you don’t know where the substation is, your power is out for 9 days). Do you have generators? Can you get fuel delivered to last the full amount of time? Would your company implement the business continuity plan?
- Ransomware just infected every Windows 2008, 2008 R2, 2003, 2000 server, every Windows ME, XP, 7 workstation, and every Linux server running SMB. The devices were not infected if they were 1) unplugged from the network, or 2) un-routable from the workstation LANs. What services went down? Do you have backups for critical systems? Were your backups on SMB shares (and thus, got destroyed too?) How would you recover?
- One power supply on your file server went bad a week ago. It shows a red light in the rear of the rack, and an amber light in the front. The lead/manager for the IT department is in the hospital and cannot help. Would your team have detected the power supply failure? How would they get a replacement?
- Your database server is filling up its main hard drive with an out-of-control log. It hit 85% full this morning, and will fill up the drive and crash the server in three days. Would your department have caught the disk space issue before the server crashed?
- One of your human resources employees lost their company-issued laptop while traveling. Was any PII or PHI on the laptop? Was the data encrypted? Would a technically-competent criminal be able to break into the encryption (for example, the password was written on it)? Could the laptop be used to access the company network? How would you disable the access?
- The CEO lost her mobile phone while shopping. It is presumed stolen, and was possibly a targeted attack by a competitor. Would a technically-competent criminal be able to break into the phone to read emails and downloaded files? Could you disable or wipe the phone remotely?
- One of your servers suddenly went offline, causing a major outage. Looking at the logs, you see “root logged on” “root initiated a shutdown”. Does your department follow best practices for password management? 1) Each individual has their OWN, NAMED, account. 2) root / admin passwords are complex and stored in a safe – any access to the password is logged and the passwords are changed afterward.
- Your inbound mail server suddenly started getting millions of emails. It is a distributed denial of service. How would you contain this attack?
- Google @yourcompany.com . What email addresses are publicly available on the web? Is there enough information there to perform a spear phishing attack? For example, company directories with names, titles, department, phone number, and email addresses are a major vulnerability.
- Your most legacy web server has been compromised. It is now hosting websites in a foreign language and is probably serving malware. Is your most legacy web server fully patched? Is it using any default usernames or passwords? How would you restore it? Assuming that an administrative password was cracked, could you change all admin passwords without breaking anything? Are there open ports from your legacy web server into the rest of the network? How could you secure these better?
- Your payments database has been breached. Who would you report it to, and what are the time requirements? Do you have lawyers and public relations staff ready for this type of incident? Include internal escalations, external escalations, and end-users.
- One of your users will attempt to install Dropbox on their corporate workstation and copy sensitive documents (including PHI) into their personal cloud storage. Is the installation prevented at the workstation? Do your firewalls prevent file transfer? Do you have data loss prevention (DLP) programs to detect and block PHI? If the user performs these actions while working through a VPN, would differences in security allow it?
- Your company will experience a Business Email Compromise today. An accountant will get an email from the CFO asking for the bank logon and password. Has your accounting department been trained on Business Email Compromise? Do they have a procedure to call or verify in-person before sending very sensitive information electronically? Do you have annual cyber security training for all staff that addresses BEC and other threats?
- Your newest IT department member wants to install software that hasn’t been purchased legally. Does your department have a policy against this? Is it part of annual or on-boarding training for your IT staff? Does your company perform audits to find and remove unexpected software?
- One of your user workstations has been compromised with remote control software. An attacker is trying to get into other systems by brute forcing passwords. As a result, common account names (default admin and root accounts as well as jsmith and .adm accounts) are being locked out across multiple systems. Do you have a procedure in place to see these failed logon attempts and lockouts in the logs? Do the accounts lock out permanently, never lock out, or unlock automatically? (The most secure option is to lock out permanently, but many admin and root accounts are set to never lock because lockouts can be abused as a denial-of-service attack.)
- One of your DMZ servers has been compromised and an attacker is using it as their base of operations. The attacker is deleting all the Windows logs. If a normal administrator logged on and saw that logs were empty, would they consider it an incident? Are the logs being sent to a central aggregation system? Do you have a procedure to check if servers are sending the normal amount of logs to this system over time? Can you access historic logs from the last year on the central aggregation system, or from other sources?
- Your oldest Linux server is going to crash today. You will need to restore from backup, or re-build it. Do you have instructions? Do you have the software? Do you have backups? Do you have vendor support for the hardware? Do you have vendor support for the software? Have you ever done a successful test restore?
- Your primary router to the Internet crashed because a hacker sent it malformed packets. How would you reset it, and how long would this take? If it continued to be attacked (crashing each time), how would you re-establish Internet services for your organization?
- Your administrator workstation was infected with a Remote Access Trojan 7 days ago without your knowledge. Does your admin workstation have ALLOW-ALL rules for outbound communications to the Internet across the firewall? During the last 7 days, did you log on with a domain administrator account or a root account that has privileges to the network, or do you have a “user” account and an “admin” account (this is more secure)? Do you have an intrusion detection agent installed on your workstation that might detect unusual network, hard drive, or process activity? When was the last time anyone checked the results from that agent? Are PowerShell scripts executed by your computer logged or monitored?
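The VPN password-guessing scenario in the list above asks whether your department would be alerted automatically. As a rough sketch of what such an alert could look for, the Python below counts how many different user names each source address has failed to log on with. It assumes you can already extract (source IP, user name) pairs from your VPN or authentication logs. The function name and the threshold of 50 are arbitrary choices for illustration, not a recommended setting.

```python
from collections import defaultdict
from typing import Iterable


def spray_suspects(failed_logons: Iterable[tuple[str, str]],
                   min_usernames: int = 50) -> dict[str, int]:
    """Return source IPs that failed logons for many distinct user names.

    One source trying many different accounts is the pattern of a
    password-guessing script, unlike a single user mistyping a password.
    """
    users_by_source: dict[str, set[str]] = defaultdict(set)
    for source_ip, username in failed_logons:
        users_by_source[source_ip].add(username.lower())
    return {
        ip: len(users)
        for ip, users in users_by_source.items()
        if len(users) >= min_usernames
    }


# Hypothetical failed-logon events: one scripted source, one forgetful user.
events = [("198.51.100.9", f"user{i}") for i in range(60)]
events += [("10.0.0.12", "jsmith")] * 5

print(spray_suspects(events))
```

An attacker who spreads attempts across many source addresses would slip under a per-address count, so treat this as one signal among several.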
4. Computer Security Incident Report Template
Download link to Microsoft Word 2016 document: Cyber Security Incident Response Template.docx
(Below is an HTML version in case you are worried about opening Word docs. Try copying and pasting it into Word; you should be able to capture the table formatting.)
Computer Incident Reporting Form
|Is this a drill?||Yes / No|
General Incident Information
|Date||Incident POC Name|
|Time||Incident POC Phone|
|Time Zone||Incident POC Email|
|Type of Incident||outage / malware / unauthorized access (outsider) / inappropriate access (insider) / espionage / data breach / other (describe)|
|Date, time, and time zone of first detection|
|List names and contact information for all persons involved in detection and initial investigation|
|How was incident detected?|
|What do you think happened?|
|List of systems involved. Include location, system name, IP address, MAC, serial number, corporate ID.|
|Where can supporting information be found?||(location of log files, time-stamps, screenshots, photographs, etc)|
|Could sensitive information have been accessed? Describe worst and best-case scenarios based on current knowledge.||Yes / No|
|||PII / PHI / PCI / GDPR / Classified / Unclassified but Sensitive / Trade Secrets / Financial / Other (describe)|
|||Worst case scenario:|
|||Best case scenario:|
|If yes, notify compliance officer, CISO, or another corporate officer immediately. Who was notified? When?|
|Were any immediate changes made in response to the incident? (such as disconnecting system or disabling accounts). List time stamps for each change.|
|Who authorized the changes?|
Chain of Custody
|Were original systems isolated for forensic review?|
|Were backups or other system-state copies created? Describe. When were they created?|
|Location of systems or copies|
|How are systems or copies protected from alteration?|
|List name and contact information for person who is responsible for safekeeping of systems or copies|
Data Breach Incident
|Refer to organization’s data breach procedures or policy. Name of document used:|
|Each day during incident, add any new findings about worst-case and best-case scenarios for the data breach. Do not delete prior days’ information.||# of persons affected|
|||List categories of data compromised (names, socials, credit card numbers, passwords, schematics for X design, etc)|
|||Was data copied outside of the organization?|
|||What sources of data were compromised?|
|Each day during incident, add any escalations or notifications that were performed.||(law enforcement) contacted on date/time|
|||Notified ### customers of potential data breach on date/time|
|Current status of incident?|
|Final root cause analysis|
|Date, time, and time-zone that incident started|
|Date, time, and time-zone that incident ended|
|Describe actions taken to resolve incident (if applicable). Who performed?|
|Describe containment and/or preventative actions (if applicable). Who performed?|
|Follow-up actions needed? List responsible party.|
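If your team would rather keep this form in a script or a small tool than in Word, the daily findings rows map naturally onto a list that only grows. The sketch below is one possible shape, not part of the template itself: the class and field names are made up, and only a few of the form's fields are shown. Each entry is stamped with the time (in UTC) and the author's name when it is added.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Finding:
    """One dated entry. Frozen so an entry cannot be edited after the fact."""
    recorded_at: datetime
    author: str
    text: str


@dataclass
class IncidentReport:
    is_drill: bool
    incident_type: str
    findings: list[Finding] = field(default_factory=list)

    def add_finding(self, author: str, text: str) -> Finding:
        """Append a new finding; earlier entries are left untouched."""
        entry = Finding(datetime.now(timezone.utc), author, text)
        self.findings.append(entry)
        return entry


report = IncidentReport(is_drill=True, incident_type="data breach")
report.add_finding("Jim", "New administrator account found on database server")
report.add_finding("CISO", "Outside consultant engaged")
for f in report.findings:
    print(f.recorded_at.isoformat(), f.author, f.text)
```

The list itself can still be altered in memory, so a real system would also need storage that prevents deletion, which is what a ticketing system gives you.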
5. FAQs about the template
Why the emphasis on escalation and recording when you first contacted someone?
If the compromise involved someone else’s data, equipment, or services, your organization is responsible for notifying them.
This statement is confusing to most people. How could a compromise of MY network involve someone else? Here are some examples:
- Your web server was compromised and an attacker downloaded all log files. These log files included credit card numbers and user registration for 30,000 customers. Your organization will need to notify credit card companies and the 30,000 customers about the breach.
- A human resources database was exported. Each of your employees is exposed to identity theft risk. Your organization will need to warn the current AND past employees about this breach.
- Your organization is a web hosting company. You found root access to several servers was compromised. These servers host 1,000+ websites that belong to about 500 customers. A hacker could have used the websites to spread malware, steal account usernames and passwords, modify billing data, or do other mayhem. After you notify your 500 customers, they might need to notify their customers if personal information was stolen.
- Your organization suffered a major compromise of almost all systems. Since your suppliers have network connections to monitor your inventory in real-time, they may have been compromised as well. You need to notify the other organizations, as well as any individuals who may have been affected.
- Finally, in all of these examples, your organization will probably need to announce the breach to its shareholders, to law enforcement, and to insurance companies.
Most countries have consumer protection laws covering personal information, financial accounts, and credit card information. These laws state that organizations MUST notify individuals if there has been a data breach or suspected data breach. Depending on the type of data involved, there are different requirements for how long a notification can be delayed.
Depending on the type of breach, you might have only 3 days to notify end-users. This means that your initial detection team may only have a few hours to start the escalation process so that your company can investigate, determine the full extent of the breach, and start notifications.
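To make the time pressure concrete, the short Python sketch below works out a notification deadline from the moment of detection. The 3-day window is only the example figure used above, not a legal requirement for any particular jurisdiction, and the function names (notification_deadline, time_remaining) are hypothetical names chosen for illustration. It insists on time-zone-aware timestamps, in line with the template's request to record the date, time, and time-zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a 3-day window, taken from the example figure in the text.
# The real window depends on the law and the type of data involved.
EXAMPLE_WINDOW = timedelta(days=3)


def notification_deadline(detected_at: datetime,
                          window: timedelta = EXAMPLE_WINDOW) -> datetime:
    """Return the latest time by which notifications should go out."""
    if detected_at.tzinfo is None:
        # A timestamp without a time zone is ambiguous between offices.
        raise ValueError("detected_at must include a time zone")
    return detected_at + window


def time_remaining(detected_at: datetime, now: datetime,
                   window: timedelta = EXAMPLE_WINDOW) -> timedelta:
    """Return how much of the window is left (negative if it has passed)."""
    if now.tzinfo is None:
        raise ValueError("now must include a time zone")
    return notification_deadline(detected_at, window) - now


# Hypothetical incident detected at 09:00 in an office at UTC-5.
office = timezone(timedelta(hours=-5))
detected = datetime(2024, 3, 1, 9, 0, tzinfo=office)
deadline = notification_deadline(detected)
```

Calling time_remaining(detected, datetime.now(timezone.utc)) during the incident shows how much of the window is left, which is one reason to record the first detection and each escalation with an exact time and time zone.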
What is chain of custody?
Chain of custody is a law enforcement term for the documented record of who has collected, handled, and stored a piece of evidence, so that it remains acceptable in court.
In traditional terms, if a bullet casing was found at a crime scene, an officer would take a picture of it and place it into an evidence bag with as little disturbance as possible. The bag would be sealed and labeled with the crime, date, time, and location. It would be kept in a secure facility with logs to show who accessed it and why.
In court, if you present evidence that hasn’t gone through this safeguarding process, the defense will say that the evidence has been tampered with or even planted. This could make it impossible to prosecute a criminal who has done damage to your organization. So it is important to have a process for chain of custody before your incident occurs.
Digital evidence presents unique challenges since it can change simply by being turned on or off.
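A custody log for digital evidence can follow the same pattern as the evidence bag and its access log. The Python sketch below is one possible way to structure such a log, not a forensic standard. The class and field names (EvidenceItem, CustodyEvent, record) are hypothetical and simply mirror the Chain of Custody fields in the template: location, responsible person, and a record of who accessed the evidence and why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CustodyEvent:
    """One access-log entry; frozen so its fields cannot be reassigned."""
    timestamp: datetime
    person: str
    action: str
    reason: str


@dataclass
class EvidenceItem:
    label: str       # what the evidence is, e.g. ticket and system name
    location: str    # where the system or copy is stored
    custodian: str   # person responsible for safekeeping
    events: List[CustodyEvent] = field(default_factory=list)

    def record(self, person: str, action: str, reason: str,
               when: Optional[datetime] = None) -> CustodyEvent:
        """Append an access entry; this method never removes entries."""
        when = when or datetime.now(timezone.utc)
        if when.tzinfo is None:
            raise ValueError("timestamp must include a time zone")
        event = CustodyEvent(when, person, action, reason)
        self.events.append(event)
        return event


# Hypothetical example values for illustration only.
item = EvidenceItem(
    label="Hypothetical ticket 1234: disk image of file server",
    location="Locked cabinet, server room",
    custodian="Jim",
)
item.record("Jim", "created image", "preserve evidence before rebuild")
```

An in-memory list like this is not tamper-proof on its own. In practice the entries would need to be kept somewhere that the people handling the evidence cannot quietly edit, such as a paper log or write-once storage.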
Why do we create images of the compromised systems?
The act of restoring functionality will often destroy evidence in digital systems. For example, the best practice response to a workstation infected with malware is to re-image the workstation entirely. This removes the risk of a latent infection, but it also erases logs and usage history which could help identify where the malware came from.
Since organizations can’t afford to stop everything for a few weeks while investigating, a common practice is to create images of systems for forensic use then rebuild them to restore functionality.
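Recording a cryptographic hash of each image at the time it is created is a widely used way to show later that the copy has not been altered. The sketch below uses Python's standard hashlib module. The helper names (sha256_of_file, image_is_unaltered) are hypothetical, and the choice of SHA-256 is an assumption, not something the template prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not have to fit in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as image:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_unaltered(path: Path, recorded_hash: str) -> bool:
    """Compare the current hash with the one recorded at imaging time."""
    return sha256_of_file(path) == recorded_hash.strip().lower()
```

The hash recorded at imaging time could be written into the template field that asks how systems or copies are protected from alteration, then re-checked with image_is_unaltered whenever the image changes hands.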
What is operating memory and why is it important during a compromise?
Operating memory is the data held temporarily in RAM. All programs use RAM to perform processing and to hold information that is accessed often. Some of this information is logged to the hard drive, but most of it just disappears over time as newer data overwrites it.
For example, as you edit an Excel spreadsheet, everything in the spreadsheet and all of your recent changes are stored in RAM. If Excel suddenly crashes before you save the document to your hard drive, a copy of your document may remain in RAM until something else overwrites it or you turn off your computer. If you turn off a device, all operating memory is cleared as the RAM loses power. Opening and closing programs will also rapidly overwrite the data stored in RAM.
It is especially important to capture operating memory for network devices like routers and switches. This is because they usually have very limited long-term storage.
A computer forensics expert can often retrace the actions of a criminal by reviewing the operating memory of a compromised device.
How can you capture operating memory?
The first step is DO NOT TURN OFF THE DEVICE.
The second step is to make as few changes as possible to the device. Don’t open programs or start processes as this may overwrite critical data in RAM.
Next, contact a cybersecurity forensics expert for help. If you are that expert, work with hardware and software vendors to use their tools for capturing operating memory. For example, Microsoft operating systems have the option to create a “memory dump”, and Cisco devices can perform “core dumps”. The techniques are specific to each system.
Why don’t we just shut down all compromised systems?
See the question above about why operating memory is important. That is about 80% of the answer.
The other reason: shutting down compromised systems tells your attacker that you know about them. They might start erasing evidence, disconnect, or leave town. You need to consider whether it is worth staying online to gather more information, or whether you should stop the damage now.
What is an alternative to shutting down a compromised system?
Many organizations will disconnect the network cable(s) instead of shutting down. It is still possible for an advanced attacker to do damage (for example, they may have planted a logic bomb in the system), but this is very rare.
Why does the template ask for worst-case and best-case scenarios?
Imagine this. You see an administrator account on your file server that shouldn’t exist. The best-case scenario is that the hacker somehow got access to the file server without touching any other system in your entire network. Best case, they just looked at a few things then decided to log off, eat some Twinkies, and change careers to become a priest. Probably not. But maybe.
The worst case scenario is that every system on your entire network has been breached, fully…