‘It is said that if you know your enemy and know yourself, you will not be imperiled in a hundred battles’. This rule, postulated in Sun Tzu’s The Art of War, dates back to the 6th century B.C. and has been widely used ever since. But does it still apply in the information-driven world we live in? Let’s look at the challenges of identifying your true enemies in the cyber realm.

The frequency of cyber attacks and major information leaks is growing year after year, and the complexity and sophistication of attackers’ tools, techniques and procedures are continually increasing. Some of those attacks affect critical infrastructure. As such, they pose risks to human lives, and they can result in unprecedented leaks of confidential and classified information, revelations and mass disruption. The ability to attribute such attacks to States, criminals or other groups or individuals can have an important deterrent effect. Reliable identification of the attacker’s origin may be used to prevent further attacks, whether through media exposure, public indictments or closed political negotiations. This is one of the reasons why nation States are highly interested in the attribution of cyber attacks. Deep study of malware may reveal pointers to its creator, who normally works closely with the operators of the malware or is even part of the same group.

A typical cyber attack investigation starts with the discovery of a breach, followed by digital forensics (i.e., forensic analysis of virtual and digital assets) and then an incident response phase during which the artefacts[1] of the attackers’ activity are collected. These artefacts are analyzed and combined with all other available information to attribute the attack.

Certain classes of information are commonly used to understand the origin of an attack. Here are some of the key elements that normally provide the information needed to attribute cyber attacks.

Metadata

Metadata includes identifiers, preferences, timestamps, file system paths and other settings that were automatically copied from the system on which the malicious file was created and ended up in the final attack tool. This information may be found in executable applications, documents, archives and other data formats. Sometimes this class of information reveals the attacker’s native language, user name, organization’s domain name, operation’s codename, and so on. If the data is not a false flag but was simply overlooked by the attackers, it can serve as a good attribution indicator.
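As one illustration of such metadata, Windows executables carry a build timestamp in their file header. The sketch below reads that field from a PE file; the function name `pe_compile_time` is hypothetical, and the value it returns is only what the file claims, since an attacker can overwrite it as a false flag.

```python
import struct
from datetime import datetime, timezone

def pe_compile_time(data):
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime.

    Illustrative helper (hypothetical name). The field is trivially
    forgeable, so treat the result as a hint, not as proof.
    """
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew at offset 0x3C of the DOS header points to the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header after the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes)
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Collected over many samples, such timestamps may hint at the working hours of a developer, provided they have not been tampered with.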

User messages and other strings

Attackers are humans just like anyone else. From time to time, they include human-readable text in the code they use for cyber attacks. Careful linguistic analysis of that text may provide hints about the attackers’ cultural background, knowledge and level of skill.
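A first step toward such analysis is simply pulling the readable text out of a binary. The following is a minimal sketch, similar in spirit to the Unix `strings` utility; the function name, the minimum length of 6 and the sample bytes are assumptions chosen for illustration, and only ASCII text is handled.

```python
import re

def extract_strings(blob, min_len=6):
    """Return printable ASCII runs of at least min_len characters."""
    # Bytes 0x20-0x7e cover the printable ASCII range
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(blob)]

# Hypothetical sample: a debug path and an error message between binary bytes
sample = (b"\x00\x01C:\\Users\\dev\\project\\loader.pdb\x00\xff\x10"
          b"Connection failed, retrying\x00ab\x00")
for text in extract_strings(sample):
    print(text)
```

Text in other encodings, such as UTF-16, would need additional patterns, and any string recovered this way may have been planted deliberately.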

Algorithms and code

Quite often, malware developers have distinct preferences for certain custom algorithms, or they repeatedly use their own unique implementation of a common algorithm. In other words, attackers tend to reuse the same code or build new code based on past experience. This allows an investigator to link malware samples and families to one another and to past campaigns, and these links can help to reliably identify the attackers.
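One simple way to quantify such code reuse is to compare the byte sequences of two routines. The sketch below uses Jaccard similarity over byte n-grams; this particular measure, the function names and the n-gram size of 4 are assumptions made for illustration, not a method described in this post. Real-world tooling typically works on disassembled code rather than raw bytes.

```python
def ngrams(data, n=4):
    """Return the set of all n-byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a, b, n=4):
    """Similarity in [0, 1]: shared n-grams divided by all n-grams."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # nothing to compare
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Identical inputs score 1.0 and inputs with no n-gram in common score 0.0; what threshold counts as a meaningful link is a judgment call for the analyst.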

Passwords and encryption keys

Attackers have obvious reasons to protect their malicious code from prying eyes and network security products, and to hide the content of malware communication with command and control servers. In addition, they often use password (or key) authentication to prevent others from hijacking their backdoors. This is why they introduce passwords, encryption keys and other cryptographic artefacts. The majority of such artefacts are generated or selected randomly. At times, however, such artefacts are reused even though the related algorithms have changed. Again, this makes it possible to link new malicious code to known attackers.
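Spotting such reuse amounts to remembering every key or password seen so far and checking new samples against that record. A minimal sketch, assuming the key material has already been extracted from each sample; the class name `KeyIndex` and the sample identifiers are hypothetical.

```python
import hashlib
from collections import defaultdict

class KeyIndex:
    """Maps a hash of extracted key material to the samples that used it."""

    def __init__(self):
        self._seen = defaultdict(set)

    @staticmethod
    def _digest(key_material):
        # Store a SHA-256 digest so the index does not hold raw secrets
        return hashlib.sha256(key_material).hexdigest()

    def add(self, key_material, sample_id):
        self._seen[self._digest(key_material)].add(sample_id)

    def lookup(self, key_material):
        """Return the ids of earlier samples that used the same key."""
        return set(self._seen.get(self._digest(key_material), set()))

# Hypothetical usage: two older samples share a key with a new one
index = KeyIndex()
index.add(b"example-rc4-key", "sample-2016-a")
index.add(b"example-rc4-key", "sample-2017-b")
print(index.lookup(b"example-rc4-key"))
```

A match only shows that the key was seen before; it does not rule out that someone copied it on purpose.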

Infrastructure

Almost all attacks carried out by nation States, criminals or other groups are carefully controlled and operated by a human. This requires maintaining infrastructure to host malware controllers (command and control servers) and entails additional components for anonymization, such as proxy servers and VPN services. This leads to another set of operator preferences, such as domains, IPs, hosting companies, DNS providers, VPN providers, software versions and other infrastructure-specific choices, which may uniquely describe the attackers and lead to the discovery of their origin.
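Comparing infrastructure preferences between a known campaign and a new one can be sketched as a per-category set intersection. The profile layout and function name below are assumptions for illustration, and the values come from address and domain ranges reserved for documentation.

```python
def shared_infrastructure(known, new):
    """Return, per indicator type, the values present in both profiles.

    Each profile is a dict mapping an indicator type (e.g. "domains")
    to a set of observed values.
    """
    overlap = {}
    for kind in known.keys() & new.keys():
        common = known[kind] & new[kind]
        if common:
            overlap[kind] = common
    return overlap

# Hypothetical profiles of an earlier campaign and a new incident
known_campaign = {
    "domains": {"update.example.com", "cdn.example.net"},
    "ips": {"192.0.2.10", "192.0.2.44"},
    "dns_providers": {"ExampleDNS"},
}
new_incident = {
    "domains": {"mail.example.org"},
    "ips": {"192.0.2.44", "198.51.100.7"},
    "dns_providers": {"ExampleDNS"},
}
print(shared_infrastructure(known_campaign, new_incident))
```

Not every match carries the same weight: a shared hosting company is far weaker evidence than a shared, rarely used IP address.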

If a cyber attack can be linked to a previous attack for which attribution was completed with an acceptable level of confidence, this link may answer the question of who is behind the more recent attack. But what can you do if this is not possible, for example because the attacker is new or because an old attacker has started from scratch with new tools, new tactics and completely changed preferences?

Identifying the physical location of the attacker

Determining the physical location of an attacker is normally an important step in reliably establishing his or her identity. Only once the chain of computers used in an attack has been traced all the way to the system where the user input was generated can the identity of the attacker truly be verified. Often, this is not possible without the support of law enforcement agencies in the place where the attacker is suspected to be based, and such support is less likely to be available when the attack is operated by a nation State. In such cases, nation States may even resort to retaliatory computer network exploitation to collect intelligence on who attacked them in the first place. Such activity fuels the ongoing international cyber arms race and encourages the creation of cyber-offensive units in State organizations. Private companies may also be helpful by collecting whatever traces the attackers leave when accessing popular internet services and infrastructure.

As in non-digital crime investigations, the discovered forensic evidence should hold up under scrutiny and describe the crime scene, and it often indicates a particular suspect’s involvement in the crime. Assuming that the origin of the attack has been determined, all the artefacts have to be reviewed and grouped into the following categories:

  1. Matching the suspect’s identity with the attack
  2. Deliberate false flags created by the suspect
  3. Others (unexplained artefacts)

The challenges of false flags

The history of cyber threat intelligence is full of campaigns with carefully crafted false flags designed to misguide and derail researchers in their investigation, either to waste their time or to make them point fingers in the wrong direction. Indeed, because attackers normally know which elements allow an investigator to attribute an attack, they are inclined to include many false flags to make attribution as complicated as possible. A prominent false flag campaign was discovered during the analysis of the network worm released at the opening ceremony of the Olympic Winter Games in Pyeongchang, South Korea, in 2018. The malware was packed with similarities to the tools of multiple previously known targeted attack groups. This led to confusion across the infosec industry when it came to attributing the attacks. What was most surprising, however, was that the attackers managed to predict the behavior and thinking of investigators by planting a false flag in an unusual metadata field that had never before been used for attribution. This suggests that attackers take the attribution problem very seriously: they not only occasionally drop typical false flags but also conduct their own counter-attribution research.

The unexplained artefacts

In cyber threat intelligence, a critical mass of unexplained artefacts normally indicates that something is not quite right, calling into question the quality of the intelligence that led to the determination of the attackers’ physical location.

In the past, attribution confusion was caused by the fact that some attackers had been compromised by other, more powerful attackers, who pulled the strings in the shadow of less experienced groups. In some scenarios they reused compromised victims’ assets by deploying their own malware. In other scenarios, they tapped into already stolen data, which is known as a ‘fourth party collection’ technique.

***

Some say that finding even a minor clue to the possible physical location of the attackers is extremely difficult and rare, and they may be right. In fact, most successful attributions have been based on identifying links between new and previously known cyber attacks. In such cases the quality of attribution depends entirely on the skills, outlook and talents of the analyst, whose difficult role is to separate real attribution artefacts from the false flags prepared by the attackers. In a way, this is similar to the game of chess: you can only win if you predict the moves of your opponent. This is why entering the field of cyber threat intelligence and doing serious attribution requires continuous and intense training, many battles and years of experience. Before you can understand and know your enemy, it is crucially important to know yourself so as not to be misguided by someone who is more skillful than you.

***

Footnotes

[1] The term ‘artefact’ is widely used in computer forensics, though there is no official definition of this term. Artefact usually refers to an object of digital archaeological interest, where digital archaeology means digital forensics without the forensic (legal) context.

***

DISCLAIMER: Posts and discussion on the Humanitarian Law & Policy blog may not be interpreted as positioning the ICRC in any way, nor does the blog’s content amount to formal policy or doctrine, unless specifically indicated.