Editor’s note: As part of this AI blog series, several posts focus on detention and the potential use of predictive algorithms to assist in decision-making in armed conflict settings. Starting off the discussion is Ashley Deeks.
***
Militaries may soon begin to develop and deploy predictive algorithms for use during armed conflicts to help them assess which actors are dangerous for purposes of detention and where future attacks are likely to occur for purposes of patrolling and targeting. The U.S. criminal justice system has already turned to predictive algorithms to help it make more objective judgments about whom to keep in custody and more efficient decisions about where to deploy police resources. In a recent article called Predicting Enemies, I wrote about this possibility and discussed the parallels between these military goals and those of the U.S. criminal justice system. Here, I build upon that article, highlighting important additional considerations that militaries should weigh as they evaluate how predictive algorithms can help them perform their missions.
In my longer article, I suggest that the focus within the Chinese and U.S. militaries on expanding their artificial intelligence and machine learning capabilities means that tools such as those used in the criminal justice system are likely to appear in the military setting soon. (In fact, it is possible that militaries are using them already in classified settings.) Criminal justice system algorithms have come under a variety of critiques, and my article argues that the U.S. military can learn important lessons from these critiques as it develops comparable tools for the armed conflict setting. These critiques include concerns that the algorithms rely on biased data, that insufficient transparency surrounds their creation and use, and that their users will suffer from excessive automation bias (that is, an undue willingness to accept a system’s recommendations).
Since I wrote the article, I have had several conversations with others who are thinking about these issues (including military officials, computer scientists, and those focused on individual rights). These conversations have highlighted some important additional considerations that militaries should weigh as they evaluate how predictive algorithms can help them perform their detention, patrolling, and targeting missions. The conversations also fleshed out some additional reasons to be cautious about transporting the kinds of algorithms developed in the criminal justice setting into a military context. This post identifies and discusses these additional considerations and provides some initial thoughts about how to meet these challenges.
First, militaries should be cautious in translating concepts from the criminal justice context—such as ‘dangerousness’ or ‘threat’—into the military context because the two settings are so different. Suresh Venkatasubramanian and his co-authors have described this as a ‘portability trap’, which they define as a ‘failure to understand how repurposing algorithmic solutions designed for one social context may be misleading, inaccurate, or otherwise do harm when applied to a different context’.
In the context at hand, a portability trap might arise if the computer scientists building an algorithm to predict individual dangerousness in an armed conflict decide to employ the same kinds of factors that U.S. criminal justice algorithms exploit to determine whether a criminal defendant is likely to commit additional offenses in the future. The data upon which computer scientists base the criminal justice algorithms is drawn from our own culture, which we understand reasonably well: federal and state law enforcement officials know what constitutes criminal behavior. Further, sentencing, bail and parole algorithms are trained on relatively objective and confirmable data (such as age, marital status, employment status and family background). Past convictions—which often serve as an important factor in assessing dangerousness—arise after the government has produced significant evidence that the person committed the offense. Even past arrests—another factor relevant to the algorithms—must be based on probable cause.
In contrast, although the United States military is attuned to the need for cross-cultural competence and trains its forces accordingly, acquiring that competence is a difficult task. Militaries will need to work hard and carefully to understand what data to use in foreign settings to develop reliable detention algorithms—and will need to ensure that their computer scientists are cross-culturally trained as well. Further, the military will need to train its algorithms on data about people who constitute ‘threats’ and about those who constitute ‘non-threats’, but that data is less likely to be tested as rigorously as criminal convictions are. In short, although at a high level of generality the concepts that undergird the criminal justice algorithms may well translate into military use, there are several ways in which the criminal justice algorithms are not easily portable. Finally, if the military seeks to import algorithmic concepts from the criminal justice setting, it should be mindful of the optics of deploying a category of algorithms that has come under sharp critique in its original setting.
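To make concrete what ‘relatively objective and confirmable data’ looks like in practice, the sketch below illustrates, under purely invented assumptions, the general shape of a recidivism-style risk model: a handful of features (age, employment status, prior convictions, prior arrests) fed into a simple classifier that outputs a probability. The feature names, data and library choices are mine for illustration only; they are not drawn from any actual criminal justice or military system, which would be far more elaborate.

```python
# Minimal, hypothetical sketch of a recidivism-style risk score.
# Feature names and data are invented for illustration only; real
# pre-trial and sentencing tools are far more elaborate and are trained
# on verified records (convictions, probable-cause arrests).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [age, employed (0/1), prior_convictions, prior_arrests]
# Label: 1 = re-offended within two years, 0 = did not (synthetic).
X = np.array([
    [22, 0, 3, 5],
    [45, 1, 0, 1],
    [31, 1, 1, 2],
    [19, 0, 2, 4],
    [52, 1, 0, 0],
    [27, 0, 4, 6],
])
y = np.array([1, 0, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# Score a new individual: the output is a statistical estimate derived
# from how similar people behaved, not a fact about this person.
new_person = np.array([[30, 1, 1, 1]])
risk = model.predict_proba(new_person)[0, 1]
print(f"Estimated risk of re-offending: {risk:.2f}")

# The portability trap: in a conflict setting the analogous inputs
# (e.g. reports of prior hostile acts) are rarely verified to the
# standard of a conviction, so the same pipeline can be far less reliable.
```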
Second, and relatedly, data matters—a lot. It is not possible to produce a reliable predictive algorithm without high-quality, reliable data. The U.S. military may have vast troves of data, but informal conversations with military officers suggest that it is spread across a host of systems, presumably in a range of formats. If the military is committed to developing advanced AI and machine learning systems moving forward—whether for use in detention, targeting or other operations—it should get serious now about preserving all relevant data in a usable form for future algorithms. (The Executive Order on AI and the U.S. Defense Department’s artificial intelligence strategy, both issued in February 2019, suggest that the Defense Department will indeed become focused on this.) In the detention context, the military should also keep in mind that law enforcement data may be both relevant and valuable, such as where a State pursues a criminal case after detaining someone and, in the process, accrues more data on the person’s behavior. Further, the United States military may want to draw not only from its own databases but also from those of its allies. As NATO consolidates its data centers, NATO members should think not only about how shared data could improve the quality of algorithms, but also about how one State’s concern about the use to which its data eventually may be put (i.e., to help train a detention or targeting algorithm) could hinder that consolidation.
Third, several people have argued to me that detention algorithms are fundamentally unfair because they make recommendations based on what others have done, not on what the person under consideration himself has done. At one level, this is true. These algorithms predict how likely it is that someone with (say) a given set of eight characteristics will engage in dangerous behavior if released. That prediction is based on the behavior of others, not on the individual’s own (future) behavior. One judge argued to me that this kind of approach is unfair, and noted that he takes into account only the specific characteristics of the person in front of him when he imposes a sentence. This critique is also relevant to security detention during armed conflict, because IHL provides that a State may only detain a person based on her individual activities (and may not detain people as a form of collective punishment).
There are at least three arguments that cut against this concern. First, if, after testing, the algorithm proves more reliable at predicting what people with those eight characteristics will do, we might still wish to use it on the person under consideration, even if our decision is based on statistical probabilities. Indeed, the algorithm’s recommendation may well be that the judge should release the person as low-risk, or that the military officer should release the person because she does not pose an imperative threat to security. Not all recommendations will be to continue detention. Second, I anticipate that military officials would use these algorithms to help guide their decisions, but would not allow the algorithm to make the decision for them. This would allow factors external to the algorithm (a detainee’s remorse, say) to remain relevant to the ultimate determination. Of course, there still may be a concern about ‘automation bias’—the idea that people rely heavily on machine recommendations even when their personal experience suggests a different answer—and my article suggests that the military needs to be attuned to this bias. Third, even when judges say they consider only the characteristics of the person standing before them, many surely still implicitly import their past experiences with other defendants who had similar characteristics. That is, the judges use their own ‘algorithms’. There is something intuitively troubling about having someone rely on what others have done to predict what you yourself will do, but in my view this objection should not bring all development in this area to a halt.
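One way to picture the second argument, in which the algorithm only guides while the official decides, is a simple decision-support workflow in which the score is non-binding and any departure from its recommendation must be recorded. The sketch below is hypothetical; the thresholds, field names and logging convention are assumptions made for illustration and do not describe any existing system.

```python
# Hypothetical decision-support workflow: the algorithm recommends,
# a human official decides, and departures from the recommendation
# are recorded for later review. Names and thresholds are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    detainee_id: str
    algorithmic_score: float       # group-based statistical estimate
    individual_factors: str        # e.g. observed conduct, expressed remorse
    decision: str                  # "release" or "continue detention"
    override_note: Optional[str] = None


def recommend(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a non-binding score to a recommendation (illustrative cut-offs)."""
    if score < low:
        return "release"
    if score > high:
        return "continue detention"
    return "refer for full review"


def decide(detainee_id: str, score: float, individual_factors: str,
           human_decision: str) -> Review:
    recommendation = recommend(score)
    # Requiring a note whenever the official departs from the recommendation
    # creates a record that can later reveal both automation bias
    # (uncritical acceptance) and unexplained overrides.
    note = None if human_decision == recommendation else (
        f"official departed from recommendation '{recommendation}'")
    return Review(detainee_id, score, individual_factors, human_decision, note)


print(decide("D-0417", 0.22, "no hostile conduct observed; expresses remorse",
             "release"))
```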
Finally, the military should be clear about the policy goals and parameters of any predictive algorithm it develops to inform detention decisions. What is the military’s tolerance for risk in releasing or retaining detainees? What level of false positives will its chosen algorithm produce, and will those false positives hinder its counter-insurgency or other military goals? Can it be sure that its algorithms help it comply with IHL? Must any algorithm it deploys be more accurate than human predictions? How will the military make the human/algorithmic comparison? Should the military attach default assumptions to particular ‘prediction’ scores, such that it will release detainees who receive a ‘low threat’ algorithmic ranking unless an official makes a compelling case to the contrary? Although these are difficult questions, the process of crafting the algorithm could help the military clarify its policy goals for detention and its interpretation of international legal standards.
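The questions about risk tolerance and false positives can be made concrete with a small, entirely synthetic calculation: for any scoring algorithm, the cut-off the military chooses determines how many non-threatening detainees are wrongly flagged and how many genuine threats are missed. The figures below are invented; the point is only that the threshold is a policy choice rather than a technical given.

```python
# Hypothetical sketch: how a chosen score threshold sets the false-positive
# and false-negative rates a detention-support algorithm will produce.
# Scores and labels are synthetic; the trade-off, not the numbers, matters.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "threat" scores for 1,000 released detainees, with invented
# ground truth of whether each later engaged in hostile acts.
labels = rng.integers(0, 2, size=1000)             # 1 = actual threat
scores = np.clip(rng.normal(0.35 + 0.3 * labels, 0.2), 0, 1)

for threshold in (0.3, 0.5, 0.7):
    flagged = scores >= threshold                  # algorithm says "detain"
    false_pos = np.mean(flagged[labels == 0])      # non-threats flagged
    false_neg = np.mean(~flagged[labels == 1])     # threats missed
    print(f"threshold={threshold:.1f}  "
          f"false-positive rate={false_pos:.2f}  "
          f"false-negative rate={false_neg:.2f}")

# A policy default of the kind discussed above might be: release anyone
# scoring below the 'low threat' cut-off unless an official documents a
# compelling case to the contrary.
```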
Even though many feel uneasy about autonomy, artificial intelligence, and machine learning in war, the Pentagon and its advanced research arm, the Defense Advanced Research Projects Agency (DARPA), are pressing ahead to expand the U.S. use of these tools. One of DARPA’s new programs is ‘trying to determine what the adversary is trying to do, his intent; and once we understand that . . . then identify how he’s going to carry out his plans—what the timing will be, and what actors will be used’. If, as seems possible, the U.S. military is already, or soon will be, contemplating the use of predictive algorithms for detention, targeting and other operations, it should be thinking now about how to address considerations such as those identified here.
***
Editor’s note
This post is part of the AI blog series, stemming from the December 2018 workshop on Artificial Intelligence at the Frontiers of International Law concerning Armed Conflict held at Harvard Law School, co-sponsored by the Harvard Law School Program on International Law and Armed Conflict, the International Committee of the Red Cross Regional Delegation for the United States and Canada and the Stockton Center for International Law, U.S. Naval War College.
Other blog posts in the series include
- Intro to series and Expert views on the frontiers of artificial intelligence and conflict
- Dustin Lewis, Legal reviews of weapons, means and methods of warfare involving artificial intelligence: 16 elements to consider
- Lorna McGregor, The need for clear governance frameworks on predictive algorithms in military settings
- Tess Bridgeman, The viability of data-reliant predictive systems in armed conflict detention
- Suresh Venkatasubramanian, Structural disconnects between algorithmic decision making and the law
- Li Qiang and Xie Dan, Legal regulation of AI weapons under international humanitarian law: A Chinese perspective
- Netta Goussac, Safety net or tangled web: Legal reviews of AI in weapons and war-fighting
See also
- ICRC, Artificial intelligence and machine learning in armed conflict: A human-centred approach, June 6, 2019
Previous posts by workshop participants
- Merel Ekelhof, Autonomous weapons: Operationalizing meaningful human control, August 15, 2018
- Eric Talbot Jensen, The human nature of international humanitarian law, August 23, 2018
- ICRC, Neil Davison, Autonomous weapon systems: An ethical basis for human control? April 3, 2018
For more posts, see our past Autonomous Weapons Series
DISCLAIMER: Posts and discussion on the Humanitarian Law & Policy blog may not be interpreted as positioning the ICRC in any way, nor does the blog’s content amount to formal policy or doctrine, unless specifically indicated.