Editor’s note: In this post, Tess Bridgeman continues the discussion on detention and the potential use of predictive algorithms in armed conflict settings, as part of the AI blog series.
She specifically looks at the requirement under IHL to only intern individuals when absolutely necessary and raises possible pitfalls around the use of predictive algorithms to make risk assessments.
***
Are predictive algorithms helpful in determining who to detain and who continues to present a security threat in an armed conflict context? Or will use of these emerging technologies be irreconcilable with the requirements of international humanitarian law (IHL) in what are likely to be data-poor, rapidly changing conflict settings? These are questions modern militaries will need to address as machine learning tools become more advanced and are used more widely by governments to make decisions about who to detain, and for how long, in modern conflicts.
In the armed conflict context, there are several ways in which statistical predictions about an individual could be integrated into decision-making about whether a person may be detained, or whether detention may continue. For example, the Fourth Geneva Convention provides for internment of protected civilians in international armed conflict if they are aliens in the territory of a party to the conflict ‘only if the security of the Detaining Power makes it absolutely necessary’. An occupying power may also intern protected persons if ‘necessary, for imperative reasons of security’. In either case, and among other safeguards, the internment must be periodically reviewed (at least every six months) to determine whether internment remains necessary. These reviews would ordinarily entail risk assessments regarding the degree of security threat, if any, posed by the individual, and whether their release would pose a threat to civilians or to the armed forces of the detaining or occupying power. (This is in contrast to status-based detention of members of regular armed forces until the cessation of active hostilities, for which no periodic review is required and such threat evaluations are irrelevant.)
Review boards have also been used in non-international armed conflicts to determine whether detained individuals—for whom combatant status may be less clear than in international armed conflicts—continue to pose a significant security threat that cannot be mitigated other than through continued detention. Examples include Periodic Review Boards at Guantanamo Bay and earlier Administrative Review Boards in Afghanistan, although the United States has treated these processes as discretionary rather than required by IHL. These review processes, too, often involve an assessment of the risk posed by the detainee, if any, based on a range of factors about their prior behaviors, current circumstances and attitudes, and potential future interactions.
In her thoughtful article Predicting Enemies, Ashley Deeks analyzes potential military adaptations of predictive algorithms developed for domestic criminal detention-related purposes in armed conflict settings. Given that militaries are exploring uses of these technologies, she is right to call our attention to the ways they might be used in assessing risk as they are imported into conflict contexts. Deeks raises several important questions about algorithmic bias (well-documented in the policing context), transparency of algorithmic decision-making and algorithmic accountability. Drawing from analogous uses of predictive technology in the domestic criminal justice context in the United States, she also explores ways in which these problems may be exacerbated in the armed conflict detention setting. But real questions also remain as to whether the use of such technologies is currently viable at all in these contexts.
The most analogous domestic criminal law situations to armed conflict detention may be those in which judges make risk assessments about an individual already in custody, such as in making decisions on bail or parole, to determine whether (or under what conditions) they should be released. While we would be remiss not to draw on experience in the law enforcement context as we evaluate potential military applications of predictive technologies, we must also be mindful that, for a whole host of reasons, the analogy is far from perfect. (Suresh Venkatasubramanian and his co-authors refer to ignoring the context that makes a model work as the ‘portability trap’ in their recent paper, ‘Fairness and Abstraction in Sociotechnical Systems’.)
Predictive algorithms in the domestic criminal context are designed to predict outcomes based on analysis of large amounts of data, often collected in the same geographic area, over time. They work best when we understand the context fully, can carefully select which features (or variables) to include in a model, and what biases they may be importing. Do any of these circumstances hold in the armed conflict context?
The issues of limited and unstable data in a big data model
We know that algorithms encode biases, norms and values. The problems of algorithmic bias and non-transparency have been rightly identified as major issues in the criminal justice context and would likely be even bigger problems in the conflict context. But before we even get to those issues, we first need to ask whether it’s possible to build a usable predictive model at all in the far messier context of an ongoing armed conflict.
At bottom, a predictive algorithm uses historical data to predict future events. The historical data is used to build a statistical model with ‘features’ (known as variables in statistics), and the model is then run on current data to make predictions. For supervised machine learning, the resulting model can be expected to yield a correct result about a new individual (within a specified margin of error) provided that the new data is drawn from the same distribution as the data on which the model was trained. Generally, the training data involves millions of data points, if not more, providing a robust base of information from which the algorithm can learn.
For this type of machine learning tool, data drawn from a changing and complex or poorly understood environment, in which the historical data is itself unstable over time or otherwise unreliable, would generally not be appropriate to train a predictive model that could be expected to yield a tolerable amount of error in the future. If we ignore this issue, we can expect our algorithm to systematically generate the wrong answers.
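The point about unstable data can be made concrete with a toy simulation. The sketch below is purely illustrative and rests on invented assumptions: a single hypothetical ‘indicator score’, synthetic data rather than any real detention data, and a deliberately simple cut-off rule standing in for a trained model. It learns the cut-off from one period and then applies it both to new data from the same distribution and to data in which the indicator has come to point at the opposite group, as might happen if alliances shift.

```python
import random
from statistics import mean

def sample(n, mean_low_risk, mean_high_risk, rng):
    """Draw n synthetic (indicator score, label) pairs; label 1 = high risk."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = mean_high_risk if label else mean_low_risk
        data.append((rng.gauss(centre, 1.0), label))
    return data

def fit_threshold(train):
    """'Train' the model: place a cut-off halfway between the class means."""
    low = mean(x for x, y in train if y == 0)
    high = mean(x for x, y in train if y == 1)
    return (low + high) / 2

def accuracy(threshold, data):
    """Share of cases where 'score above cut-off' matches the true label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Historical period (assumed): high-risk individuals tend to score higher.
train = sample(5000, 0.0, 2.0, rng)
threshold = fit_threshold(train)

# New cases drawn from the same distribution as the training data.
same_dist = sample(5000, 0.0, 2.0, rng)

# New cases after a shift (assumed): the indicator now marks the other group.
shifted = sample(5000, 2.0, 0.0, rng)

acc_same = accuracy(threshold, same_dist)
acc_shifted = accuracy(threshold, shifted)
print(f"cut-off learned from historical data: {threshold:.2f}")
print(f"accuracy, same distribution: {acc_same:.2f}")
print(f"accuracy, shifted distribution: {acc_shifted:.2f}")
```

Under these assumptions the cut-off is right for most cases while conditions match the training period and wrong for most cases once the indicator's meaning has reversed. Nothing inside the model signals that this has happened; the error only becomes visible if outcomes are checked independently.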
Start with data collection. The process of building and training a model generally requires a relatively large amount of reliable and well-understood data. Will a detaining or occupying power be able to gather large enough amounts of accurate data during hostilities or occupation, and ensure that it is properly coded? Given that even in the domestic context some communities are wary of providing information to authorities, we can imagine how much harder this is when the information needed is from those on the other side of an armed conflict (we can also imagine a detaining power would not likely have access to existing sources of data on the relevant communities, should there be any worth using).
Next, choosing features (or variables) will also present difficulty in a fast-moving environment. Intelligence regarding indicators of threat could change rapidly, or alliances could shift, rendering earlier indicators unreliable because they could point to the wrong people altogether. Will it be possible for those creating a predictive risk assessment algorithm to know what features should be used in a given socio-cultural context, or how they should be weighted?
Additionally, the process of data collection, training and validation of a model takes time, expertise, and resources, raising questions of whether a time-consuming and resource-intensive process is appropriate or useful in a conflict environment. In order to even begin building a model, we would need to have some number of people apprehended already, understand whether the detaining authority was correct or incorrect in having apprehended them (or whether they do or don’t present a security threat), and understand a set of characteristics about them that could be generalized to the broader population.
Finally, if it is the case that a model developed in a data-poor, fast-changing environment will likely be imperfect at best, error rates can be expected to be high. In some contexts, that might be a tolerable result. But it’s a large, and potentially insurmountable, problem in the context of making weighty decisions on liberty and security. Particularly where IHL requires that continued internment is permissible only when absolutely necessary or for imperative reasons of security, deferring too much to a predictive model that is likely to generate inaccurate results about whether, or the degree to which, an individual poses a security risk could be seen as inconsistent with a State’s IHL obligations.
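A related difficulty in a data-poor setting is that the error rate itself may be hard to pin down. As a rough illustration, the sketch below computes an approximate 95% Wilson score interval for a model's measured accuracy. The case counts are invented for the example (40 versus 4,000 reviewed cases, each with 80% judged correct) and are not drawn from any real review process.

```python
from math import sqrt

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion correct."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Hypothetical validation sets: same observed accuracy, very different sizes.
small_lo, small_hi = wilson_interval(32, 40)
large_lo, large_hi = wilson_interval(3200, 4000)

print(f"40 cases:   plausible accuracy {small_lo:.2f} to {small_hi:.2f}")
print(f"4000 cases: plausible accuracy {large_lo:.2f} to {large_hi:.2f}")
```

With only a few dozen cases whose outcomes are actually known, the same headline figure is compatible with a much wider range of true accuracy, so a decision-maker could not be confident about how often the tool is wrong.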
These potential pitfalls of using predictive algorithms are raised not to dismiss the idea that they could ever be useful as a component of detention-related decision-making in a conflict environment. Rather, they are intended to raise two fundamental questions. First, given the likely limitations and obstacles presented in armed conflict contexts, are we capable of constructing predictive algorithmic tools that generate accurate enough results such that they can be relied on in a manner that would be consistent with a State’s IHL obligations?
If so, the second question is whether pursuing this route, based on existing technology, is worth the tremendous amount of time and resources it would take to collect the necessary data, ensure its accuracy and create the predictive tools to guide decision-making discretion. Put simply—would it actually be more accurate, cheaper or faster to use these technologies than to rely on existing decision-making systems, despite their own imperfections? If not, these technologies may still be in search of a solvable problem for militaries that might employ them.
This post is part of the AI blog series, stemming from the December 2018 workshop on Artificial Intelligence at the Frontiers of International Law concerning Armed Conflict held at Harvard Law School, co-sponsored by the Harvard Law School Program on International Law and Armed Conflict, the International Committee of the Red Cross Regional Delegation for the United States and Canada and the Stockton Center for International Law, U.S. Naval War College.
Other blog posts in the series include
- Intro to series and Expert views on the frontiers of artificial intelligence and conflict
- Dustin Lewis, Legal reviews of weapons, means and methods of warfare involving artificial intelligence: 16 elements to consider
- Lorna McGregor, The need for clear governance frameworks on predictive algorithms in military settings
- Suresh Venkatasubramanian, Structural disconnects between algorithmic decision making and the law
- Li Qiang and Xie Dan, Legal regulation of AI weapons under international humanitarian law: A Chinese perspective
- Netta Goussac, Safety net or tangled web: Legal reviews of AI in weapons and war-fighting
- ICRC, Artificial intelligence and machine learning in armed conflict: A human-centred approach, June 6, 2019
Previous posts by workshop participants
- Merel Ekelhof, Autonomous weapons: Operationalizing meaningful human control, August 15, 2018
- Eric Talbot Jensen, The human nature of international humanitarian law, August 23, 2018
- ICRC, Neil Davison, Autonomous weapon systems: An ethical basis for human control? April 3, 2018