Editor’s note: As part of this AI blog series, several posts focus on detention and the potential use of predictive algorithms to assist in decision-making in armed conflict settings. Starting off the discussion is Ashley Deeks.

***

Militaries may soon begin to develop and deploy predictive algorithms for use during armed conflicts to help them assess which actors are dangerous for purposes of detention and where future attacks are likely to occur for purposes of patrolling and targeting. The U.S. criminal justice system has already turned to predictive algorithms to help it make more objective judgments about whom to keep in custody and more efficient decisions about where to deploy police resources. In a recent article called Predicting Enemies, I wrote about this possibility and discussed the parallels between goals such as these on the military side and those of the U.S. criminal justice system. Here, I build upon that article, highlighting important additional considerations that militaries should weigh as they evaluate how predictive algorithms can help them perform their missions.

***

In my longer article, I suggest that the focus within the Chinese and U.S. militaries on expanding their artificial intelligence and machine learning capabilities means that tools such as those used in the criminal justice system are likely to appear in the military setting soon. (In fact, it is possible that militaries are using them already in classified settings.) Criminal justice system algorithms have come under a variety of critiques, and my article argues that the U.S. military can learn important lessons from these critiques as it develops comparable tools for the armed conflict setting. These critiques include concerns that the algorithms rely on biased data, that insufficient transparency surrounds their creation and use, and that their users will suffer from excessive automation bias (that is, an undue willingness to accept a system’s recommendations).

Since I wrote the article, I have had several conversations with others who are thinking about these issues (including military officials, computer scientists, and those focused on individual rights). These conversations have highlighted some important additional considerations that militaries should weigh as they evaluate how predictive algorithms can help them perform their detention, patrolling, and targeting missions. The conversations also fleshed out some additional reasons to be cautious about transporting the kinds of algorithms developed in the criminal justice setting into a military context. This post identifies and discusses these additional considerations and provides some initial thoughts about how to meet these challenges.

First, militaries should be cautious in translating concepts from the criminal justice context—such as ‘dangerousness’ or ‘threat’—into the military context because the two settings are so different. Suresh Venkatasubramanian and his co-authors have described this as a ‘portability trap’, which they define as a ‘failure to understand how repurposing algorithmic solutions designed for one social context may be misleading, inaccurate, or otherwise do harm when applied to a different context’.

In the context at hand, a portability trap might arise if the computer scientists building an algorithm to predict individual dangerousness in the context of an armed conflict decide to employ the same kinds of factors that U.S. criminal justice algorithms exploit to determine whether a criminal defendant is likely to commit additional offenses in the future. The data upon which computer scientists base the criminal justice algorithms is drawn from our own culture, which we understand reasonably well: federal and State law enforcement officials know what constitutes criminal behavior. Further, sentencing, bail and parole algorithms are trained on relatively objective and confirmable data (such as age, marital status, employment status and family background). Past convictions—which often serve as an important factor in assessing dangerousness—arise after the government has produced significant evidence that the person committed the offense. Even past arrests—another factor relevant to the algorithms—must be based on probable cause.

In contrast, although the United States military is attuned to the need for cross-cultural competence and trains its forces accordingly, achieving that competence is difficult. Militaries will need to work hard and carefully to understand what data to use in foreign settings to develop reliable detention algorithms, and will need to ensure that their computer scientists are cross-culturally trained as well. Further, the military will need to train its algorithms on data about people who constitute ‘threats’ and people who constitute ‘non-threats’, but those classifications are less likely to have been tested as rigorously as criminal convictions are. In short, although at a high level of generality the concepts that undergird the criminal justice algorithms may well translate into military use, there are several ways in which the criminal justice algorithms are not easily portable. In addition, if the military seeks to import algorithmic concepts from the criminal justice setting, it should be attuned to the optics of deploying a category of algorithms that has come under sharp critique in its original setting.

Second, and relatedly, data matters a great deal. It is not possible to produce a reliable predictive algorithm without high-quality, reliable data. The U.S. military may have vast realms of data, but informal conversations with military officers suggest that it is spread across a host of systems, presumably in a range of formats. If the military is committed to developing advanced AI and machine learning systems moving forward, whether for use in detention, targeting or other operations, it should get serious now about preserving all relevant data in a usable form for future algorithms. (The Executive Order on AI and the U.S. Defense Department’s artificial intelligence strategy, both issued in February 2019, suggest that the Defense Department will indeed become focused on this.) In detention, the military should also keep in mind that law enforcement data may be both relevant and valuable, such as where a State pursues a criminal case after detaining someone and, in the process, accrues more data on the person’s behavior. Further, the United States military may want to draw not only on its own databases but also on those of its allies. As NATO consolidates its data centers, NATO members should think not only about how shared data could improve the quality of algorithms, but also about how one State’s concern about the use to which its data eventually may be put (i.e., to help train a detention or targeting algorithm) could hinder that consolidation.

Third, several people have argued to me that detention algorithms are fundamentally unfair because they make recommendations based on what others have done, not on what the person under consideration has done. At one level, this is true. These algorithms estimate how likely it is that someone with (say) a particular set of eight characteristics will engage in dangerous behavior if released. That prediction is based on the past behavior of others, not on the individual’s own (future) behavior. One judge argued to me that this kind of approach is unfair, noting that he takes into account only the specific characteristics of the person in front of him when he imposes a sentence. This critique is also relevant to security detention during armed conflict, because international humanitarian law (IHL) provides that a State may detain a person only on the basis of her individual activities (and may not detain people as a form of collective punishment).

There are at least three arguments that cut against this concern. First, if testing shows that the algorithm predicts what people with those eight characteristics will do more reliably than human decision-makers can, we might still wish to use it on the person under consideration, even though its recommendation rests on statistical probabilities. Indeed, the algorithm’s recommendation may well be that the judge should release the person as low-risk, or that the military officer should release the person because she does not pose an imperative threat to security. Not all recommendations will be to continue detention. Second, I anticipate that military officials would use these algorithms to help guide their decisions, but would not allow the algorithm to make the decision for them. This would allow factors external to the algorithm (a detainee’s remorse, say) to remain relevant to the ultimate determination. Of course, there may still be a concern about ‘automation bias’ (the idea that people rely heavily on machine recommendations even when their personal experience suggests a different answer), and my article suggests that the military needs to be attuned to this bias. Third, even when judges say they consider only the characteristics of the person standing before them, many judges surely still implicitly import their past experiences with other defendants who had similar characteristics. That is, the judges use their own ‘algorithms’. There is something intuitively troubling about having someone rely on what others have done to predict what you yourself will do. But, in my view, this objection should not bring all developments in this area to a halt.

Finally, the military should be clear about the policy goals and parameters of any predictive algorithm it develops to inform detention decisions. What is the military’s tolerance for risk in releasing or retaining detainees? What level of false positives will its chosen algorithm produce, and will those false positives hinder its counter-insurgency or other military goals? Can it be sure that its algorithms help it comply with IHL? Must any algorithm it deploys be more accurate than human predictions? How will the military make the human/algorithmic comparison? Should the military attach default assumptions to particular ‘prediction’ scores, such that it will release detainees who receive a ‘low threat’ algorithmic ranking unless an official makes a compelling case to the contrary? Although these are difficult questions, the process of crafting the algorithm could help the military clarify its policy goals for detention and its interpretation of international legal standards.
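To make the false-positive question more concrete, here is a minimal Python sketch. It assumes a hypothetical algorithm that gives each person a ‘threat’ score between 0 and 1, and it uses a handful of made-up outcomes; nothing here describes a real military or criminal justice system, and the names `HYPOTHETICAL_SCORES`, `confusion_at_threshold` and the cut-off values are illustrative inventions. The point it illustrates is that choosing the score above which the algorithm recommends continued detention is itself a policy decision that trades false positives (people flagged who were not threats) against false negatives (threats recommended for release).

```python
from typing import List, Tuple

# Hypothetical, made-up data for illustration only: each score is a
# hypothetical algorithm's 0-1 "threat" score for one person, and each
# label records whether that person actually proved to be a threat.
HYPOTHETICAL_SCORES: List[float] = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
HYPOTHETICAL_LABELS: List[bool] = [True, True, False, True, False, False, True, False]


def confusion_at_threshold(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (true_pos, false_pos, true_neg, false_neg) when every score at or
    above `threshold` is treated as a recommendation to keep detaining and
    every score below it as a default recommendation to release."""
    tp = fp = tn = fn = 0
    for score, is_threat in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_threat:
            tp += 1  # correctly flagged as a threat
        elif flagged:
            fp += 1  # flagged, but not actually a threat (false positive)
        elif is_threat:
            fn += 1  # not flagged, but actually a threat (false negative)
        else:
            tn += 1  # correctly recommended for release
    return tp, fp, tn, fn


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual non-threats who were nonetheless flagged."""
    return fp / (fp + tn) if fp + tn else 0.0


def false_negative_rate(fn: int, tp: int) -> float:
    """Share of actual threats who would have been recommended for release."""
    return fn / (fn + tp) if fn + tp else 0.0


if __name__ == "__main__":
    for threshold in (0.25, 0.5, 0.75):
        tp, fp, tn, fn = confusion_at_threshold(
            HYPOTHETICAL_SCORES, HYPOTHETICAL_LABELS, threshold
        )
        print(
            f"threshold={threshold}: "
            f"false positive rate={false_positive_rate(fp, tn):.2f}, "
            f"false negative rate={false_negative_rate(fn, tp):.2f}"
        )
```

With these invented numbers, raising the cut-off from 0.25 to 0.75 lowers the false positive rate from 0.75 to 0.00 but doubles the false negative rate from 0.25 to 0.50. No cut-off removes both kinds of error, so the choice reflects the risk tolerance the military must decide on. The sketch does not model the human override the post contemplates, in which an official could make a compelling case against a default recommendation.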

Even though many feel uneasy about autonomy, artificial intelligence, and machine learning in war, the Pentagon and its advanced research arm, the Defense Advanced Research Projects Agency (DARPA), are pressing ahead to expand the U.S. use of these tools. One of DARPA’s new programs is ‘trying to determine what the adversary is trying to do, his intent; and once we understand that . . . then identify how he’s going to carry out his plans—what the timing will be, and what actors will be used’. If, as seems possible, the U.S. military is already, or soon will be, contemplating the use of predictive algorithms for detention, targeting and other operations, it should be thinking now about how to address considerations such as those identified here.

***

Editor’s note

This post is part of the AI blog series, stemming from the December 2018 workshop on Artificial Intelligence at the Frontiers of International Law concerning Armed Conflict held at Harvard Law School, co-sponsored by the Harvard Law School Program on International Law and Armed Conflict, the International Committee of the Red Cross Regional Delegation for the United States and Canada and the Stockton Center for International Law, U.S. Naval War College.



DISCLAIMER: Posts and discussion on the Humanitarian Law & Policy blog may not be interpreted as positioning the ICRC in any way, nor does the blog’s content amount to formal policy or doctrine, unless specifically indicated.